arXiv:1501.00594v2 [cs.SI] 16 Jun 2015


Stochastic block model and exploratory analysis in signed networks 
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

We propose a generalized stochastic block model to explore the mesoscopic structures in signed 
networks by grouping vertices that exhibit similar positive and negative connection profiles into the 
same cluster. In this model, the group memberships are viewed as hidden or unobserved quantities, 
and the connection patterns between groups are explicitly characterized by two block matrices, one 
for positive links and the other for negative links. By fitting the model to the observed network, we 
can not only extract various structural patterns existing in the network without prior knowledge, 
but also recognize what specific structures we obtained. Furthermore, the model parameters provide 
vital clues about the probabilities that each vertex belongs to different groups and the centrality of 
each vertex in its corresponding group. This information sheds light on the discovery of the network's
overlapping structures and the identification of two types of important vertices, which serve as the
cores of each group and the bridges between different groups, respectively. Experiments on a series 
of synthetic and real-life networks show the effectiveness as well as the superiority of our model. 

PACS numbers: 89.75.Fb, 05.10.-a 


I. INTRODUCTION 

The study of networks has received considerable attention in recent literature. This is mainly
attributed to the fact that a network provides a concise mathematical representation for social,
technological [6], biological and other complex systems in the real world, which paves the way
for a proper analysis of such systems' organizations, functions and dynamics.

Many networks are found to possess a multitude of mesoscopic structural patterns, which can be
coarsely divided into “assortative” or “community” structure and “disassortative” or
“bipartite/multipartite” structure. In addition, other types of mesoscopic structures, such as the
“core-periphery” motif, have been observed in real-life networks as well. Along with these
discoveries, a large number of techniques have been proposed for mesoscopic structure extraction,
in particular for community detection (see, e.g., [8] and recent reviews). Most, if not all,
existing techniques require us to know which specific structure we are looking for before we
study it. Unfortunately, we often know little about a given network and have no idea which
structures to expect or which methods could detect them. Biased results will be obtained if an
inappropriate method is chosen. Even if we know something beforehand, it is still difficult for a
method that is exclusively designed for a certain type of mesoscopic structure to uncover the
aforementioned miscellaneous structures that may simultaneously coexist in a network or may even
overlap with each other [8].

To overcome these difficulties, a mixture model, a stochastic block model and their various
extensions and combinations have been recently introduced to enable an “exploratory” analysis of
networks, allowing us to extract unspecified structural patterns even if some edges in the
networks are missing. By fitting the model to the observed network structure, vertices with the
same connection profiles are categorized into a predefined number of groups. The philosophy of
these approaches is quite similar to that of the “role model” in sociology: individuals having
locally or globally analogous relationships with others play the same “role” or take up the same
“position”. Clearly, the possible topologies of the groups include community structure and
multipartite structure, but they can be far more general.

One common assumption shared by these models is that the target networks contain positive links
only. However, we frequently encounter signed networks, which have both positive and negative
edges, in biology [12, 33], computer science [33] and, last but not least, social science
[33–37]. The negative connections usually represent hostility, conflict, opposition,
disagreement and distrust between individuals or organizations, as well as the anticorrelation
among objectives. Their coupling with positive links has been empirically shown to play a crucial
role in the function and evolution of the whole network [33, 37].

Several works have been conducted to detect community structure in these kinds of networks.
Yang et al. [33] proposed an agent-based method that performs a random walk from one specific
vertex for a few steps to mine the communities in positive and signed networks. Gomez et al. [33]
presented a generalization of the widely used modularity [10, 14] to allow for negative links.
Traag and Bruggeman [35] extended the Potts model to incorporate negative edges, resulting in a
method similar to the clustering of signed graphs. These approaches focus on the problem of
community detection and thus inevitably fail if the signed networks comprise other structural
patterns, for example the disassortative structure, as shown in Sec. IV A. To make matters worse,
they simply give a “hard” partition of signed networks in which a specific vertex can belong to
one and only one cluster. As with positive networks, we have good reason to believe that signed
networks also simultaneously include all kinds of mesoscopic structures that might overlap with
each other.

In this paper, we aim to capture and extract the intrinsic mesoscopic structure of networks with
both positive and negative links. This goal is achieved by dividing the vertices into groups such
that the vertices within each group have similar positive and negative connection patterns to
other groups. We propose a generalized stochastic block model, referred to as the signed
stochastic block model (SSBM), in which the group memberships of each vertex are represented by
unobserved or hidden quantities, and the relationship among groups is explicitly characterized by
two block matrices, one for the positive links and the other for the negative links. By using the
expectation-maximization algorithm, we fit the model to the observed network structure and reveal
the structural patterns without prior knowledge of which specific structures exist in the network.
As a result, not only can various unspecified structures be successfully found, but their types
can also be immediately elucidated by the block matrices. In addition, the model parameters tell
us the fuzzy group memberships and the centrality of each vertex, which enable us to discover the
network's overlapping structures and to identify two kinds of important vertices, i.e., group
cores and bridges. Experiments on a number of synthetic and real-world networks validate the
effectiveness and the advantage of our model.

The rest of this paper is organized as follows. We begin with the depictions of the mesoscopic
structures, especially the definitions of the community structure and disassortative structure,
in signed networks in Sec. II. Then we introduce an extension of the stochastic block model in
Sec. III and show how to employ it to perform an exploratory analysis of a given network with
both positive and negative links. Experimental results on a series of synthetic networks with
various designed structures and three social networks are given in Sec. IV, followed by the
conclusions in Sec. V.


II. MESOSCOPIC STRUCTURES IN SIGNED 
NETWORKS 

It is well known that the mesoscopic structural patterns in positive networks can be roughly
classified into the following two types. “Assortative structure”, usually called “community
structure”, refers to groups of vertices within which connections are relatively dense and
between which they are sparser. In contrast, “disassortative structure”, also named “bipartite
structure” or more generally “multipartite structure”, means that network vertices have most of
their connections outside their group.

For a signed network, the mesoscopic structure is quite different from and much more complicated
than that in a positive network, since both the density and the sign of the links should be taken
into account at the same time. The intuitive descriptions of the assortative structure and
disassortative structure given above are no longer suitable. A natural question arises: How can
we characterize the mesoscopic structures in a network that has both positive and negative edges?
Guidance can be provided by the social balance theory, which states that the attitudes of two
individuals toward a third person should match if they are positively related. In this situation,
the triad is said to be socially balanced. A network is called balanced provided that all its
triads are balanced. This concept can be further generalized to $k$-balance when the network can
be divided into $k$ clusters, each having only positive links within itself and negative links
with others.
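
As a small illustration of the $k$-balance condition, the following Python sketch checks whether a given partition makes a signed network balanced, that is, whether every positive link lies inside a cluster and every negative link runs between clusters. The function name `is_k_balanced` and the list-of-lists adjacency format are assumptions made for this example only; they do not come from the original work.

```python
def is_k_balanced(adj, groups):
    """Return True if the partition `groups` makes the signed network balanced.

    adj[i][j] > 0 is a positive link, adj[i][j] < 0 a negative link and
    0 means no link; groups[i] is the (hypothetical) cluster label of vertex i.
    """
    n = len(adj)
    for i in range(n):
        for j in range(n):
            same = groups[i] == groups[j]
            if adj[i][j] > 0 and not same:
                return False  # positive link between two clusters
            if adj[i][j] < 0 and same:
                return False  # negative link inside a cluster
    return True


# Two friendly pairs that are mutually hostile.
adj = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]
balanced = is_k_balanced(adj, [0, 0, 1, 1])
unbalanced = is_k_balanced(adj, [0, 1, 0, 1])
```

The check only verifies a candidate partition; finding such a partition is the clustering problem addressed in the rest of the paper.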

Following this principle, we can reasonably describe the community structure in a signed network
as a set of groups of vertices within which positive links are comparatively dense and negative
links are sparser, and between which, on the contrary, positive links are much sparser and
negative links are denser. Obviously, it is an extension of the standard community structure in
networks with positive edges. In contrast, the disassortative structure can be defined as a
collection of vertices that have most of their negative links within the group to which they
belong while having the majority of their positive connections outside their group.

III. METHODS 
A. The SSBM Model 

Given a directed network $G = (V, E)$, we can represent it by an adjacency matrix $A$. The
entries of the matrix are defined as: $A_{ij} = 1$ if a positive link is present from vertex $i$
to vertex $j$, $A_{ij} = -1$ if a negative link is present from vertex $i$ to vertex $j$, and
$A_{ij} = 0$ otherwise. For weighted networks, $A_{ij}$ can be generalized to represent the
weight of the link. We further separate the positive component from the negative one by setting
$A^{+}_{ij} = A_{ij}$ if $A_{ij} > 0$ and 0 otherwise, and $A^{-}_{ij} = -A_{ij}$ if
$A_{ij} < 0$ and 0 otherwise, so $A = A^{+} - A^{-}$.
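
The decomposition $A = A^{+} - A^{-}$ can be sketched in a few lines of Python. The helper name `split_signed` and the nested-list matrix format are assumptions chosen for illustration.

```python
def split_signed(adj):
    """Split a signed (possibly weighted) adjacency matrix into A+ and A-.

    Both returned matrices are non-negative, and adj = A_pos - A_neg holds
    entry by entry.
    """
    pos = [[a if a > 0 else 0 for a in row] for row in adj]
    neg = [[-a if a < 0 else 0 for a in row] for row in adj]
    return pos, neg


A = [
    [0, 1, -1],
    [2, 0, 0],
    [-3, 0, 0],
]
A_pos, A_neg = split_signed(A)
# Recombine to confirm that nothing was lost in the split.
recombined = [[p - m for p, m in zip(prow, mrow)]
              for prow, mrow in zip(A_pos, A_neg)]
```

Keeping the two parts separate is what later allows positive and negative links to be governed by their own block matrices.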

Suppose that the vertices fall into $c$ groups whose memberships are “hidden” or “missing” for
the moment and will be inferred from the observed network structure. The number of groups $c$ can
also be inferred from the data, which will be discussed in Sec. III C, but we take it as given
here. The standard solution for such an inference problem is to give a generative model for the
observed network structure and then to determine the parameters of the model by finding its best
fit.

The model we use is a kind of stochastic block model that parameterizes the probability of each
possible configuration of group assignments and edges as follows (see Fig. 1 for a schematic
illustration). Given an edge $e_{ij}$, we choose a pair of groups $r$ and $s$ for its tail and
head with probability $\omega^{+}_{rs}$ if $e_{ij}$ is positive, or with probability
$\omega^{-}_{rs}$ if $e_{ij}$ is negative. The two scalars $\omega^{+}_{rs}$ and
$\omega^{-}_{rs}$, giving the probability that a randomly selected positive or negative edge,
respectively, runs from group $r$ to group $s$, explicitly characterize various types of
connecting patterns among groups, as we will see later. Then, we draw the tail vertex $i$ from
group $r$ with probability $\theta_{ri}$ and the head vertex $j$ from group $s$ with probability
$\phi_{sj}$. Intuitively, the parameter $\theta_{ri}$ captures the centrality of vertex $i$ in
group $r$ from the perspective of outgoing edges, while $\phi_{sj}$ describes the centrality of
vertex $j$ in group $s$ from the perspective of incoming edges. The parameters
$\omega^{+}_{rs}$, $\omega^{-}_{rs}$, $\theta_{ri}$ and $\phi_{sj}$ satisfy the normalization
conditions

$$\sum_{r=1}^{c}\sum_{s=1}^{c}\omega^{+}_{rs}=1,\qquad \sum_{r=1}^{c}\sum_{s=1}^{c}\omega^{-}_{rs}=1,\qquad \sum_{i=1}^{n}\theta_{ri}=1,\qquad \sum_{j=1}^{n}\phi_{sj}=1.$$

FIG. 1: Stochastic block model for signed networks. Unfilled circles represent observed network
structure and filled ones correspond to hidden memberships. The solid line between vertex $i$ and
$j$ indicates the existence of one positive or negative edge connecting them. The dashed line
indicates that the relation between the corresponding quantities is unobserved and must be
learned from the observed network data.

Let $\overrightarrow{g}_{ij}$ and $\overleftarrow{g}_{ij}$ be, respectively, the group
memberships of the tail and head of the edge $e_{ij}$. So far, we have introduced all the
quantities in our model: observed quantities $\{A_{ij}\}$, hidden quantities
$\{\overrightarrow{g}_{ij}, \overleftarrow{g}_{ij}\}$ and model parameters
$\{\omega^{+}_{rs}, \omega^{-}_{rs}, \theta_{ri}, \phi_{sj}\}$. To simplify the notation, we
shall henceforth denote by $\omega^{+}$ the entire set $\{\omega^{+}_{rs}\}$, and similarly
$\omega^{-}$, $\theta$, $\phi$, $\overrightarrow{g}$ and $\overleftarrow{g}$ for
$\{\omega^{-}_{rs}\}$, $\{\theta_{ri}\}$, $\{\phi_{sj}\}$, $\{\overrightarrow{g}_{ij}\}$ and
$\{\overleftarrow{g}_{ij}\}$. The probability that we observe a positive edge $e^{+}_{ij}$ can be
written as

$$\Pr(e^{+}_{ij}\mid\omega^{+},\theta,\phi)=\sum_{rs}\omega^{+}_{rs}\theta_{ri}\phi_{sj}, \tag{1}$$

and the probability of observing a negative edge $e^{-}_{ij}$ is

$$\Pr(e^{-}_{ij}\mid\omega^{-},\theta,\phi)=\sum_{rs}\omega^{-}_{rs}\theta_{ri}\phi_{sj}. \tag{2}$$

The marginal likelihood of the signed network, therefore, can be represented by

$$\Pr(A\mid\omega^{+},\omega^{-},\theta,\phi)=\prod_{ij}\Bigl(\sum_{rs}\omega^{+}_{rs}\theta_{ri}\phi_{sj}\Bigr)^{A^{+}_{ij}}\Bigl(\sum_{rs}\omega^{-}_{rs}\theta_{ri}\phi_{sj}\Bigr)^{A^{-}_{ij}}. \tag{3}$$

Note that self-loops are allowed, and the weights $A^{+}_{ij}$ and $A^{-}_{ij}$ are viewed,
respectively, as the numbers of positive and negative multiple links from vertex $i$ to vertex
$j$, as done in many existing models.

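To infer the missing group memberships $\overrightarrow{g}$ and $\overleftarrow{g}$, we need to
maximize the likelihood in Eq. (3) with respect to the model parameters $\omega^{+}$,
$\omega^{-}$, $\theta$ and $\phi$. For convenience, one usually works not directly with the
likelihood itself but with its logarithm

$$\mathcal{L}=\ln\Pr(A\mid\omega^{+},\omega^{-},\theta,\phi)=\sum_{ij}\Bigl[A^{+}_{ij}\ln\Bigl(\sum_{rs}\omega^{+}_{rs}\theta_{ri}\phi_{sj}\Bigr)+A^{-}_{ij}\ln\Bigl(\sum_{rs}\omega^{-}_{rs}\theta_{ri}\phi_{sj}\Bigr)\Bigr]. \tag{4}$$

The maximum of the likelihood and its logarithm occur in the same place because the logarithm is
a monotonically increasing function.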
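
To make Eq. (4) concrete, here is a minimal Python sketch that evaluates the log-likelihood for given parameters. The function name `log_likelihood` and the group-first indexing `theta[r][i]`, `phi[s][j]` are conventions assumed for this example, and the code expects every observed edge to have non-zero probability under the model.

```python
import math


def log_likelihood(A_pos, A_neg, w_pos, w_neg, theta, phi):
    """Evaluate Eq. (4); theta[r][i] and phi[s][j] are indexed group-first."""
    c = len(w_pos)
    n = len(A_pos)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for A, w in ((A_pos, w_pos), (A_neg, w_neg)):
                if A[i][j] == 0:
                    continue  # absent edges contribute nothing to the sum
                # Probability of one edge of this sign from i to j, Eqs. (1)-(2).
                p = sum(w[r][s] * theta[r][i] * phi[s][j]
                        for r in range(c) for s in range(c))
                total += A[i][j] * math.log(p)
    return total


# One group, two vertices: every edge has probability 0.5 * 0.5 = 0.25.
A_pos = [[0, 1], [0, 0]]
A_neg = [[0, 0], [2, 0]]
value = log_likelihood(A_pos, A_neg, [[1.0]], [[1.0]],
                       [[0.5, 0.5]], [[0.5, 0.5]])
```

In this toy case one positive edge and a doubled negative edge give three factors of 0.25, so the result equals $3\ln 0.25$.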

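Considering that the group memberships $\overrightarrow{g}$ and $\overleftarrow{g}$ are unknown,
it is intractable to optimize the log-likelihood $\mathcal{L}$ directly. We can, however, give a
good guess of the hidden variables $\overrightarrow{g}$ and $\overleftarrow{g}$ according to the
network structure and the model parameters, and seek the maximization of the following expected
log-likelihood

$$\begin{aligned}
\bar{\mathcal{L}} &= \sum_{\overrightarrow{g},\overleftarrow{g}}\Pr(\overrightarrow{g},\overleftarrow{g}\mid A^{+},\omega^{+},\theta,\phi)\ln\Pr(A^{+}\mid\overrightarrow{g},\overleftarrow{g},\omega^{+},\theta,\phi)
 + \sum_{\overrightarrow{g},\overleftarrow{g}}\Pr(\overrightarrow{g},\overleftarrow{g}\mid A^{-},\omega^{-},\theta,\phi)\ln\Pr(A^{-}\mid\overrightarrow{g},\overleftarrow{g},\omega^{-},\theta,\phi) \\
&= \sum_{ijrs}\Bigl[\Pr(r,s\mid e^{+}_{ij},\omega^{+},\theta,\phi)\,A^{+}_{ij}\bigl(\ln\omega^{+}_{rs}+\ln\theta_{ri}+\ln\phi_{sj}\bigr)
 + \Pr(r,s\mid e^{-}_{ij},\omega^{-},\theta,\phi)\,A^{-}_{ij}\bigl(\ln\omega^{-}_{rs}+\ln\theta_{ri}+\ln\phi_{sj}\bigr)\Bigr] \\
&= \sum_{ijrs}q^{+}_{ijrs}A^{+}_{ij}\bigl(\ln\omega^{+}_{rs}+\ln\theta_{ri}+\ln\phi_{sj}\bigr)
 + \sum_{ijrs}q^{-}_{ijrs}A^{-}_{ij}\bigl(\ln\omega^{-}_{rs}+\ln\theta_{ri}+\ln\phi_{sj}\bigr),
\end{aligned} \tag{5}$$

where $q^{+}_{ijrs}=\Pr(\overrightarrow{g}_{ij}=r,\overleftarrow{g}_{ij}=s\mid e^{+}_{ij},\omega^{+},\theta,\phi)$
is the probability of finding a positive edge with its tail vertex $i$ from group $r$ and its
head vertex $j$ from group $s$, given the network and the model parameters. An analogous
interpretation holds for
$q^{-}_{ijrs}=\Pr(\overrightarrow{g}_{ij}=r,\overleftarrow{g}_{ij}=s\mid e^{-}_{ij},\omega^{-},\theta,\phi)$.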

The expected log-likelihood gives our best estimate of the value of $\mathcal{L}$, and the
position of its maximum gives the most likely values of the model parameters. Finding the maximum
still presents a problem, however, since the calculation of $q^{+}_{ijrs}$ and $q^{-}_{ijrs}$
requires the values of $\omega^{+}$, $\omega^{-}$, $\theta$ and $\phi$, and vice versa. One
possible solution is to adopt an iterative self-consistent approach that evaluates both
simultaneously. Like many previous works, we utilize the expectation-maximization (EM)
algorithm, which first computes the posterior probabilities of hidden variables using estimated
model parameters and observed data (the E-step), and then re-estimates the model parameters (the
M-step).

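In the E-step, we calculate the expected probabilities $q^{+}_{ijrs}$ and $q^{-}_{ijrs}$ given
the observed network $A$ and parameters $\omega^{+}$, $\omega^{-}$, $\theta$ and $\phi$:

$$q^{+}_{ijrs}=\frac{\Pr(\overrightarrow{g}_{ij}=r,\overleftarrow{g}_{ij}=s,e^{+}_{ij}\mid\omega^{+},\theta,\phi)}{\Pr(e^{+}_{ij}\mid\omega^{+},\theta,\phi)}=\frac{\omega^{+}_{rs}\theta_{ri}\phi_{sj}}{\sum_{rs}\omega^{+}_{rs}\theta_{ri}\phi_{sj}},$$

$$q^{-}_{ijrs}=\frac{\Pr(\overrightarrow{g}_{ij}=r,\overleftarrow{g}_{ij}=s,e^{-}_{ij}\mid\omega^{-},\theta,\phi)}{\Pr(e^{-}_{ij}\mid\omega^{-},\theta,\phi)}=\frac{\omega^{-}_{rs}\theta_{ri}\phi_{sj}}{\sum_{rs}\omega^{-}_{rs}\theta_{ri}\phi_{sj}}. \tag{6}$$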
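
The E-step of Eq. (6) amounts to normalizing the joint weights $\omega_{rs}\theta_{ri}\phi_{sj}$ over all group pairs $(r, s)$ for one edge. The sketch below shows this for a single edge of one sign; the name `edge_posterior`, the numeric parameter values and the group-first indexing are illustrative assumptions.

```python
def edge_posterior(w, theta, phi, i, j):
    """Posterior q[r][s] that an edge i -> j has its tail in r and head in s.

    `w` is the block matrix of the edge's sign (omega+ or omega-);
    theta[r][i] and phi[s][j] are indexed group-first.
    """
    c = len(w)
    joint = [[w[r][s] * theta[r][i] * phi[s][j] for s in range(c)]
             for r in range(c)]
    norm = sum(sum(row) for row in joint)
    if norm == 0.0:
        raise ValueError("edge has zero probability under the model")
    return [[x / norm for x in row] for row in joint]


# Hypothetical parameters for two groups and three vertices.
w_pos = [[0.4, 0.1], [0.1, 0.4]]
theta = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
phi = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
q = edge_posterior(w_pos, theta, phi, 0, 2)
```

Here vertex 0 is central in the first group and vertex 2 in the second, so the pair (first group, second group) receives the largest posterior weight even though the block matrix favors within-group edges.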


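In the M-step, we use the values of $q^{+}_{ijrs}$ and $q^{-}_{ijrs}$ estimated in the E-step to
evaluate the expected log-likelihood and to find the values of the parameters that maximize it.
Introducing the Lagrange multipliers $\rho^{+}$, $\rho^{-}$, $\gamma_{r}$ and $\lambda_{s}$ to
incorporate the normalization conditions, the expression to be maximized becomes

$$\tilde{\mathcal{L}}=\bar{\mathcal{L}}+\rho^{+}\Bigl(1-\sum_{rs}\omega^{+}_{rs}\Bigr)+\rho^{-}\Bigl(1-\sum_{rs}\omega^{-}_{rs}\Bigr)+\sum_{r}\gamma_{r}\Bigl(1-\sum_{i}\theta_{ri}\Bigr)+\sum_{s}\lambda_{s}\Bigl(1-\sum_{j}\phi_{sj}\Bigr), \tag{7}$$

where $\bar{\mathcal{L}}$ is the expected log-likelihood of Eq. (5).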


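Setting the derivatives of $\tilde{\mathcal{L}}$ to zero, the maximum of the expected
log-likelihood is attained where

$$\omega^{+}_{rs}=\frac{\sum_{ij}A^{+}_{ij}q^{+}_{ijrs}}{\sum_{ijrs}A^{+}_{ij}q^{+}_{ijrs}},\qquad
\omega^{-}_{rs}=\frac{\sum_{ij}A^{-}_{ij}q^{-}_{ijrs}}{\sum_{ijrs}A^{-}_{ij}q^{-}_{ijrs}},$$

$$\theta_{ri}=\frac{\sum_{js}\bigl(A^{+}_{ij}q^{+}_{ijrs}+A^{-}_{ij}q^{-}_{ijrs}\bigr)}{\sum_{ijs}\bigl(A^{+}_{ij}q^{+}_{ijrs}+A^{-}_{ij}q^{-}_{ijrs}\bigr)},\qquad
\phi_{sj}=\frac{\sum_{ir}\bigl(A^{+}_{ij}q^{+}_{ijrs}+A^{-}_{ij}q^{-}_{ijrs}\bigr)}{\sum_{ijr}\bigl(A^{+}_{ij}q^{+}_{ijrs}+A^{-}_{ij}q^{-}_{ijrs}\bigr)}. \tag{8}$$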


Eqs. (6) and (8) constitute our EM algorithm for exploratory analysis of signed networks. When
the algorithm converges, we obtain a set of values for the hidden quantities $q^{+}_{ijrs}$,
$q^{-}_{ijrs}$ and the model parameters $\omega^{+}$, $\omega^{-}$, $\theta$ and $\phi$.

It is worthwhile to note that the EM algorithm is known to converge to local maxima of the
likelihood but not always to the global maximum. With different starting values, the algorithm
may give rise to different solutions. To obtain a satisfactory solution, we perform several runs
with different initial conditions and return the solution giving the highest log-likelihood over
all the runs.
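
The iteration of Eqs. (6) and (8), together with the restart strategy just described, can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the names `em_ssbm` and `fit_best`, the fixed iteration count (instead of a convergence test), the random initialization and the uniform fallback for empty groups are all assumptions.

```python
import math
import random


def _normalize(values):
    """Scale non-negative numbers to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total <= 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def _normalize_matrix(mat):
    flat = _normalize([x for row in mat for x in row])
    k = len(mat[0])
    return [flat[r * k:(r + 1) * k] for r in range(len(mat))]


def _edge_prob(w, theta, phi, i, j):
    c = len(w)
    return sum(w[r][s] * theta[r][i] * phi[s][j]
               for r in range(c) for s in range(c))


def em_ssbm(A_pos, A_neg, c, n_iter=100, seed=0):
    """One EM run; returns (log-likelihood, w_pos, w_neg, theta, phi)."""
    rng = random.Random(seed)
    n = len(A_pos)

    def rand(rows, cols):
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    w = [_normalize_matrix(rand(c, c)), _normalize_matrix(rand(c, c))]
    theta = [_normalize(row) for row in rand(c, n)]
    phi = [_normalize(row) for row in rand(c, n)]
    # Only observed edges are visited, so one iteration costs O(m c^2).
    edges = [[(i, j, A[i][j]) for i in range(n) for j in range(n) if A[i][j] > 0]
             for A in (A_pos, A_neg)]
    for _ in range(n_iter):
        acc_w = [[[0.0] * c for _ in range(c)] for _ in range(2)]
        acc_theta = [[0.0] * n for _ in range(c)]
        acc_phi = [[0.0] * n for _ in range(c)]
        for sign in (0, 1):
            for i, j, a in edges[sign]:
                norm = _edge_prob(w[sign], theta, phi, i, j)
                if norm <= 0.0:
                    continue
                for r in range(c):
                    for s in range(c):
                        # E-step, Eq. (6), weighted by the edge multiplicity.
                        m = a * w[sign][r][s] * theta[r][i] * phi[s][j] / norm
                        acc_w[sign][r][s] += m
                        acc_theta[r][i] += m
                        acc_phi[s][j] += m
        # M-step, Eq. (8): renormalize the accumulated expected counts.
        w = [_normalize_matrix(acc_w[0]), _normalize_matrix(acc_w[1])]
        theta = [_normalize(row) for row in acc_theta]
        phi = [_normalize(row) for row in acc_phi]
    loglik = 0.0
    for sign in (0, 1):
        for i, j, a in edges[sign]:
            p = _edge_prob(w[sign], theta, phi, i, j)
            loglik += a * math.log(p) if p > 0.0 else float("-inf")
    return loglik, w[0], w[1], theta, phi


def fit_best(A_pos, A_neg, c, n_runs=10):
    """Keep the run with the highest log-likelihood over several random starts."""
    runs = (em_ssbm(A_pos, A_neg, c, seed=s) for s in range(n_runs))
    return max(runs, key=lambda run: run[0])


# Two friendly pairs that are mutually hostile, split into A+ and A-.
A_pos = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
A_neg = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
best = fit_best(A_pos, A_neg, c=2, n_runs=5)
```

Because each run starts from a different random seed, `fit_best` can only improve on any single run; a production version would normally stop each run once the log-likelihood changes by less than a tolerance.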

Now we consider the computational complexity of the EM algorithm. For each iteration, the cost
consists of two parts. The first part is from the calculation of $q^{+}_{ijrs}$ and
$q^{-}_{ijrs}$ using Eq. (6), whose time complexity is $O(m \times c^{2})$. Here $m$ is the
number of edges in the network and $c$ is the number of groups. The second part is from the
estimation of the model parameters using Eq. (8), whose time complexity is also
$O(m \times c^{2})$. We use $T$ to denote the number of iterations before the iteration process
converges. Then, the total cost of the EM algorithm for our model is
$O(T \times m \times c^{2})$. It is difficult to give a theoretical estimate of the number $T$
of iterations. Generally speaking, $T$ is determined by the network structure and the initial
condition.


B. Soft partition and overlapping structures 

The parameters obtained by fitting the model to the observed network structure with the EM
algorithm provide useful information about the mesoscopic structure in a given network.
Specifically, the matrices $\omega^{+}$ and $\omega^{-}$, analogous to the image graph in the
role model, characterize the connecting patterns among different groups, which determine the type
of structural patterns. Furthermore, $\theta$ and $\phi$ indicate the centrality of a vertex in
its groups from the perspective of outgoing edges and incoming edges, respectively. Consequently,
the probability of vertex $i$ being drawn from group $r$ when it is the tail of edges can be
defined as

$$\alpha_{ir}=\frac{\sum_{s}\bigl(\omega^{+}_{rs}+\omega^{-}_{rs}\bigr)\theta_{ri}}{\sum_{rs}\bigl(\omega^{+}_{rs}+\omega^{-}_{rs}\bigr)\theta_{ri}}, \tag{9}$$

and vertex $i$ can be simply assigned to the group $r^{*}$ to which it most likely belongs, i.e.,
$r^{*}=\arg\max_{r}\{\alpha_{ir},\ r=1,2,\ldots,c\}$. The result gives a hard partition of the
signed network.
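
The tail-side memberships of Eq. (9) and the resulting hard partition can be computed directly from fitted parameters. The sketch below assumes hypothetical helper names (`tail_memberships`, `hard_partition`), made-up parameter values and group-first indexing `theta[r][i]`.

```python
def tail_memberships(w_pos, w_neg, theta):
    """alpha[i][r] of Eq. (9): probability that tail vertex i belongs to group r."""
    c = len(theta)
    n = len(theta[0])
    # Total positive plus negative weight leaving each group r.
    out_weight = [sum(w_pos[r][s] + w_neg[r][s] for s in range(c))
                  for r in range(c)]
    alpha = []
    for i in range(n):
        scores = [out_weight[r] * theta[r][i] for r in range(c)]
        total = sum(scores)
        # A vertex without outgoing weight gets a uniform membership (assumption).
        alpha.append([x / total if total > 0 else 1.0 / c for x in scores])
    return alpha


def hard_partition(alpha):
    """Assign each vertex to its most likely group."""
    return [max(range(len(row)), key=row.__getitem__) for row in alpha]


# Hypothetical fitted parameters for two groups and four vertices.
w_pos = [[0.5, 0.0], [0.0, 0.5]]
w_neg = [[0.0, 0.5], [0.5, 0.0]]
theta = [[0.5, 0.4, 0.1, 0.0], [0.0, 0.1, 0.4, 0.5]]
alpha = tail_memberships(w_pos, w_neg, theta)
labels = hard_partition(alpha)
```

The rows of `alpha` are the soft memberships; taking the arg max discards the information about vertices that sit between groups.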

In fact, the set of scalars $\{\alpha_{ir}\}_{r=1}^{c}$ supplies us with the probabilities that
vertex $i$ belongs to different groups, which can be referred to as the soft or fuzzy
memberships. Assigning vertices to more than one group has attracted a great deal of interest,
particularly in overlapping community detection. The vertices belonging to several groups are
found to take a special role in networks, for example, signal transduction in biological
networks. Furthermore, some vertices, considered as “instable” [16], are located on the border
between two groups and thus are difficult to classify into any group. It is of great importance
to reveal the global organization of a signed network in terms of overlapping mesoscopic
structures and to find the instable vertices. We employ here the bridgeness and group entropy to
capture the vertices' instabilities and to extract the overlapping mesoscopic structure. These
two measures of vertex $i$ are
computed as

$$b_{i}=1-\sqrt{\frac{c}{c-1}\sum_{r=1}^{c}\Bigl(\alpha_{ir}-\frac{1}{c}\Bigr)^{2}}, \tag{10}$$

$$\xi_{i}=-\sum_{r=1}^{c}\alpha_{ir}\log_{c}\alpha_{ir}. \tag{11}$$

Note that vertex $i$ has a large bridgeness $b_{i}$ and entropy $\xi_{i}$ when it most likely
participates in more than one group simultaneously, and vice versa. From the perspective of
incoming edges, we can represent the probability of vertex $j$ belonging to group $s$ by

$$\beta_{js}=\frac{\sum_{r}\bigl(\omega^{+}_{rs}+\omega^{-}_{rs}\bigr)\phi_{sj}}{\sum_{rs}\bigl(\omega^{+}_{rs}+\omega^{-}_{rs}\bigr)\phi_{sj}}. \tag{12}$$

The statements made for $\alpha_{ir}$ apply equally to $\beta_{js}$, so we do not repeat them.

The model described above focuses on directed networks. It can be easily generalized to
undirected networks by letting the parameter $\theta$ be identical to $\phi$. The derivation
follows the case of directed networks and the results are the same as Eqs. (6) and (8).

C. Model selection

So far, our model assumes that the number of groups $c$ is known a priori. This information,
however, is unavailable in many cases. It is necessary to provide a criterion to determine an
appropriate group number for a given network. Several methods have been proposed to deal with
this model selection issue. We adopt the minimum description length (MDL) principle, which is
also utilized in the previous generative models for network structure exploration [25].

According to the MDL principle, the required length to describe the network data comprises two
components. The first one describes the coding length of the network, which is $-\mathcal{L}$
for a directed network and $-\mathcal{L}/2$ for an undirected network. The other gives the length
for coding the model parameters, which is
$-\sum_{rs}\ln\omega^{+}_{rs}-\sum_{rs}\ln\omega^{-}_{rs}-\sum_{ri}\ln\theta_{ri}-\sum_{sj}\ln\phi_{sj}$
for a directed network and
$-\sum_{rs}\ln\omega^{+}_{rs}-\sum_{rs}\ln\omega^{-}_{rs}-\sum_{ri}\ln\theta_{ri}$
for an undirected network. The optimal $c$ is the one that minimizes the total description
length.


IV. EXPERIMENTAL RESULTS

In this section, we extensively test our SSBM model on a series of synthetic signed networks
with various known structures, including community structure and disassortative structure. After
that, the method is also applied to three real-life social networks.

A. Synthetic networks

The ad hoc networks designed by Girvan and Newman [12] have been broadly used to validate and
compare community detection algorithms. By contrast, there exists no such benchmark for community
detection in networks with both positive and negative links. We generate the signed ad hoc
networks with controlled community structure by a previously developed method. The networks have
128 vertices, which are divided into four groups with 32 vertices each. Edges are placed randomly
such that they are positive within groups and negative between groups, with the average degree of
a vertex set to 16. The community structure is controlled by three parameters: $p_{in}$, the
probability of each vertex connecting to other vertices in the same group; $p_{+}$, the
probability of positive links appearing between groups; and $p_{-}$, the probability of negative
links arising within groups. Thus, the parameter $p_{in}$ regulates the cohesiveness of the
communities, and the remaining parameters $p_{+}$ and $p_{-}$ add noise to the community
structure when $p_{in}$ is fixed.

For the synthetic networks, we simply consider their hard partition as defined in Sec. III B. The
results are evaluated by the normalized mutual information (NMI) [43], which can be formulated as

$$\mathrm{NMI}(C_{1},C_{2})=\frac{\sum_{i}\sum_{j}n_{ij}\ln\dfrac{n_{ij}\,n}{n^{(1)}_{i}n^{(2)}_{j}}}{\sqrt{\Bigl(\sum_{i}n^{(1)}_{i}\ln\dfrac{n^{(1)}_{i}}{n}\Bigr)\Bigl(\sum_{j}n^{(2)}_{j}\ln\dfrac{n^{(2)}_{j}}{n}\Bigr)}},$$

where $C_{1}$ and $C_{2}$ are the true group assignment and the assignment found by the
algorithms, respectively, $n$ is the number of vertices, $n_{ij}$ is the number of vertices in
the known group $i$ that are assigned to the inferred group $j$, $n^{(1)}_{i}$ is the number of
vertices in the true group $i$, and $n^{(2)}_{j}$ is the number of vertices in the inferred
group $j$. The larger the NMI value, the better the partition obtained by the algorithms.

FIG. 2: (Color online) NMI of our method and other approaches on balanced ad-hoc networks with
controlled community structure (a) and disassortative structure (b). Each point is an average
over 50 realizations of the networks.
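
For readers who want to reproduce the evaluation, the NMI between a known and an inferred hard partition can be computed as in the following sketch. It uses the form normalized by the geometric mean of the two partition entropies; the function name `nmi` and the handling of the degenerate single-group case are assumptions of this example.

```python
import math
from collections import Counter


def nmi(true_labels, found_labels):
    """Normalized mutual information between two hard partitions."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, found_labels))   # n_ij
    size_true = Counter(true_labels)                  # n_i^(1)
    size_found = Counter(found_labels)                # n_j^(2)
    numerator = sum(
        n_ij * math.log(n_ij * n / (size_true[i] * size_found[j]))
        for (i, j), n_ij in joint.items()
    )
    h_true = sum(k * math.log(k / n) for k in size_true.values())
    h_found = sum(k * math.log(k / n) for k in size_found.values())
    denominator = math.sqrt(h_true * h_found)
    if denominator == 0.0:
        # At least one partition is a single group, where NMI is undefined.
        # Returning 1.0 only when both are trivial is a convention assumed here.
        return 1.0 if len(size_true) == len(size_found) == 1 else 0.0
    return numerator / denominator


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # same grouping, labels swapped
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # groupings share no information
```

The measure is invariant to relabeling of the groups, which is why it suits the comparison of a planted partition with the output of any clustering method.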

We conduct two different experiments. First, we set the two parameters $p_{+}$ and $p_{-}$ to
zero and gradually change $p_{in}$ from 1 to 0. In this situation, all the generated synthetic
networks are 4-balanced. Fig. 2(a) reports the experimental results obtained by our method and
two state-of-the-art approaches, namely generalized modularity maximization through simulated
annealing (denoted by GMMax) and the finding and extracting community (FEC) method [31]. In
addition, we also implement the simulated annealing algorithm to maximize the standard modularity
by ignoring the sign of the links (denoted by MMax) and by removing the negative edges (denoted
by PMMax), respectively. Each point in the curves is an average over 50 realizations of the
synthetic random networks. Bear in mind that the community structure becomes less cohesive as
the parameter $p_{in}$ decreases from 1 to 0. We can see that both the SSBM model and the GMMax
method perform fairly well and are able to recover the communities in the synthetic networks
almost perfectly in all cases. When $0 < p_{in} < 0.1$, our model is even slightly superior to
the GMMax approach. The remaining three methods, however, can only achieve promising results when
$p_{in}$ is sufficiently large. They all deteriorate quickly as $p_{in}$ becomes smaller. For
example, the NMI of the FEC algorithm begins to drop once $p_{in}$ falls below 0.8, then quickly
reduces to less than 0.2 when $p_{in} = 0.5$, and even to approximately 0 when $p_{in}$ is
smaller than 0.3. Similar performance can be observed for the MMax and PMMax approaches as well.
These results are quite understandable since both the SSBM model and the GMMax method consider
the contribution made by the negative links in signed networks, which is either neglected or
removed in the remaining three approaches. This highlights the importance of the negative edges
for community detection in signed networks. Moreover, the PMMax method always outperforms the
MMax method, especially when $p_{in}$ is in the range $0 < p_{in} < 0.5$, which is in agreement
with previously reported results and indicates that the positive links in signed networks have a
significant impact on community detection.

Then, we fix the parameter $p_{in} = 0.8$ and gradually change the other two parameters $p_{+}$
and $p_{-}$ from 0 to 0.5, respectively. The synthetic networks are no longer balanced in this
setting. The results obtained by our model and the two state-of-the-art algorithms are given in
the upper row of Fig. 3. As we can see, the SSBM model consistently, and sometimes significantly,
outperforms the other two approaches. More specifically, its NMI is always 1 except for a few
negligible perturbations. By contrast, the FEC algorithm cannot offer a satisfactory partition of
the signed networks when $0 < p_{+} < 0.3$ and $0 < p_{-} < 0.5$, where its NMI is less than 0.4
at all times. When $0 < p_{+} < 0.3$ and $0 < p_{-} < 0.5$, the GMMax approach exhibits a
competitive performance, but its NMI suddenly collapses and continuously decreases once $p_{+}$
is larger than 0.3.

We turn now to the second experiment, in which the synthetic networks have a controlled
disassortative structure. The signed networks are generated in the same way, except that we
randomly place negative links within groups and positive links between groups. Similarly, the
disassortative structure in these networks is again controlled by three parameters: $p_{in}$
indicates the probability of each vertex connecting to other vertices in the same group, $p_{+}$
the probability of positive links appearing within groups, and $p_{-}$ the probability of
negative links arising between groups.

We first study the balanced networks by setting $p_{+}$ and $p_{-}$ to zero and changing
$p_{in}$ from 1 to 0 once again. As shown in Fig. 2(b), the FEC algorithm, the MMax method and
our model achieve performance very similar to that in the first experiment. That is, our model
always successfully finds the clusters in the synthetic networks in all cases, while the FEC
algorithm and the MMax method perform fairly well when $p_{in}$ is large enough, but quickly
degrade as $p_{in}$ approaches 0. The PMMax and the GMMax methods, however, perform rather badly.
The NMI of the PMMax method is no greater than 0.5 even if $p_{in} = 1$, while the NMI of the
GMMax approach nearly vanishes in all cases. This is because the two methods, which seek standard
and generalized modularity maximization, respectively, are suitable only for community detection.
As a consequence, they are bound to fail in this experiment. Instead, one should minimize the
modularity to uncover the multipartite structure in networks. Therefore, we apply the simulated
annealing algorithm to minimize the generalized modularity (denoted by GMMin) and the standard
modularity by ignoring the sign of links (denoted by MMin) and by excluding the negative
connections (denoted by PMMin), respectively. We see from Fig. 2(b) that the GMMin method obtains
performance competitive with our SSBM model except for slightly inferior results when
$0 < p_{in} < 0.1$. However, the MMin and the PMMin approaches perform unsatisfactorily because
they do not consider the contributions derived from the negative links.

FIG. 3: NMI on unbalanced ad-hoc networks with controlled community structure (first row) for
(a) FEC, (b) GMMax and (c) SSBM, and with controlled disassortative structure (second row) for
(d) FEC, (e) GMMin and (f) SSBM. Each point is an average over 50 realizations of the networks.

We investigate next the disassortative structure in unbalanced synthetic networks by fixing
$p_{in} = 0.8$ and changing $p_{+}$ and $p_{-}$ from 0 to 0.5 step by step. The lower row of
Fig. 3 gives the results obtained by the FEC method, the GMMin approach and our SSBM model,
which are quite similar to those in the first experiment. In particular, although the SSBM does
not perform perfectly in some cases, its NMI is still rather high, namely more than 0.98. When
$0 < p_{-} < 0.3$, the GMMin approach yields sufficiently good results, but its NMI drops rapidly
as $p_{-}$ approaches 0.5. The FEC algorithm achieves the worst performance in all cases.

Finally, we focus on a synthetic network containing 
a multitude of mesoscopic structures, whose adjacency 
matrix is given in Fig. 4(a). Intuitively, according to 
the outgoing edges in this network, the second group is 
a community structure and the third group is a 
disassortative structure. The first group, with positive 
outgoing links only, can be viewed as an example of 
the standard community structure in positive networks, 
while the last group, which includes only negative 
outgoing links, can be regarded as an extreme example of the 
disassortative structure in signed networks. Meanwhile, 
from the perspective of incoming edges, the four groups 
exhibit different types of structural patterns, which 
cannot be categorized simply as community structure or 
disassortative structure. We apply the FEC algorithm, the 
GMMax method, the GMMin method and our model to 
this signed network. Limited by their intrinsic 
assumptions, the FEC algorithm, the GMMax method and the 
GMMin method fail to uncover the structural patterns, 
as shown in Fig. 4(b)-(d). In particular, the generalized 
modularity proposed in Refs. [35, 36], regardless of 
whether it is maximized or minimized, yields an improper 
partition of the network in which the four groups merge 
with each other. By contrast, by grouping vertices with 
the same connection profiles, our model accurately 
detects all types of mesoscopic structures, both from the 
perspective of outgoing links (Fig. 4(e)) and from the 
perspective of incoming edges 

FIG. 4: (Color online) Detecting the mesoscopic structure of a synthetic network. (a) The adjacency matrix of the signed 
network, where the black dots denote the positive links and the gray dots represent the negative edges. The partitioning results 
for different methods: (b) FEC, (c) GMMax, (d) GMMin, and SSBM from the perspective of outgoing edges (e) and incoming 
edges (f), where the solid edges denote the positive links and the dashed edges represent negative links. The sizes of the vertices 
in (e) and (f) indicate their centrality degree in the corresponding groups according to the parameters $\theta$ and $\phi$, respectively. 


[Plot panels for Fig. 5; horizontal axis: Group number c] 

FIG. 5: Model selection results for (a) the Slovene Parliamen¬ 
tary network, (b) the Gahuku-Gama Subtribes network and 
(c) the international conflict and alliance network. 


(Fig. 4(f)). Furthermore, the obtained parameters $\theta$ and 
$\phi$ reveal the centrality of each vertex in its corresponding 
group from the two perspectives. 


B. Real-life networks 

We further test our method by applying it to several 
real networks containing both positive and negative links. 
The first network is a relation graph of the 10 parties of 
the Slovene Parliament in 1994 [44]. The weights of 
links in the network were estimated from 72 questionnaires 
collected among the 90 members of the Slovene National 
Parliament. The questionnaires were designed to estimate 
the distance between the ten parties on a scale from -3 to 3, 
and the final weights were the averaged values multiplied 
by 100. 

Applying our model to this signed network, we find 
that the MDL reaches its minimum at c = 2, as shown 
in Fig. 5(a), indicating that there are exactly two 
communities in the network. Fig. 6(a) gives the partition 
obtained by our method, which divides the network into 
two groups of equal size and produces a completely con- 

TABLE I: The soft group membership $\alpha$, bridgeness $b_i$ and group entropy [20] of each vertex in the Slovene Parliamentary 
network [44]. Larger bridgeness or entropy means that the corresponding node is more “unstable”. 

Vertex          SKD      ZLSD     SDSS     LDS      ZS-ESS   ZS       DS       SLS      SPS-SNS  SNS
$\alpha_{i1}$   1.000    0        1.000    0        0        1.000    0        1.000    1.000    0.0186
$\alpha_{i2}$   0        1.000    0        1.000    1.000    0        1.000    0        0        0.9814
$b_i$           0        0        0        0        0        0        0        0        0        0.0372
Entropy         0        0        0        0        0        0        0        0        0        0.1334


TABLE II: The soft group membership $\alpha$, bridgeness $b_i$ and group entropy [20] of each vertex in the Gahuku-Gama 
Subtribes network [45]. Larger bridgeness or entropy means that the corresponding node is more “unstable”. 

Vertex          GAVEV    KOTUN    OVE      ALIKA    NAGAM    GAHUK    MASIL    UKUDZ    NOTOH    KOHIK
$\alpha_{i1}$   1.000    1.000    0        0        0        0        0        0        0        0
$\alpha_{i2}$   0        0        1.000    1.000    0        1.000    0.7143   1.000    0        0
$\alpha_{i3}$   0        0        0        0        1.000    0        0.2857   0        1.000    1.000
$b_i$           0        0        0        0        0        0        0.3773   0        0        0
Entropy         0        0        0        0        0        0        0.5446   0        0        0

Vertex          GEHAM    ASARO    UHETO    SEUVE    NAGAD    GAMA
$\alpha_{i1}$   0        0        0        0        1.000    1.000
$\alpha_{i2}$   1.000    1.000    0        0        0        0
$\alpha_{i3}$   0        0        1.000    1.000    0        0
$b_i$           0        0        0        0        0        0
Entropy         0        0        0        0        0        0

FIG. 6: Exploratory analysis of the Slovene Parliamentary 
network [44]. The solid edges denote the positive links and 
the dashed edges represent negative links. The true 
community structure in this network is represented by two different 
shapes, circle and square. The shades of nodes indicate the 
membership $\alpha$ obtained by fitting our model to this network. 
The sizes of the vertices, proportional to $\theta$, indicate their 
centrality degree with respect to their corresponding group. 


sistent split with the true communities in the network. 
As expected, vertices within the same community are 
mostly connected by positive links while vertices from 
different communities are mainly connected by negative 
links. We shade each vertex in proportion to the 
parameters $\{\alpha_{ir}\}_{r=1}^{c}$, the magnitudes of which give 
the probabilities of each vertex belonging to the different 
groups.$^1$ From Table I we see that all the vertices 
can be exclusively separated into two communities, 
except for the vertex “SNS”, which belongs to the circle 
group with probability 0.0186 and to the square group 
with probability 0.9814. In other words, the two 
communities overlap with each other at this vertex, resulting in 
its relatively high bridgeness of 0.0372 and group entropy of 0.1334. 
This is supported by the observation that the vertex has 
two negative links with the vertices “ZS-ESS” and “DS” in 
the same community. We also visualize the learned 
parameters $\omega^+$ and $\omega^-$ in Fig. 6(b), which indeed provide 
a coarse-grained description of the signed network and 
reveal that this network actually has two communities. 
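
The bridgeness and group entropy quoted above can be recomputed from the soft memberships alone. The sketch below assumes the common fuzzy-membership definitions: bridgeness as one minus the rescaled distance of the membership vector from the uniform vector, and group entropy as the Shannon entropy with logarithm base c. These formulas are an assumption rather than a quotation from the paper, but they reproduce the tabulated values for “SNS” and “MASIL” to within rounding. The function names are chosen here for illustration.

```python
from math import log, sqrt


def bridgeness(membership):
    """1 - sqrt(c/(c-1) * sum_r (alpha_r - 1/c)^2); zero for a hard assignment."""
    c = len(membership)  # number of groups, must be at least 2
    spread = sum((a - 1.0 / c) ** 2 for a in membership)
    return 1.0 - sqrt(c / (c - 1.0) * spread)


def group_entropy(membership):
    """-sum_r alpha_r * log_c(alpha_r), with the convention 0 * log 0 = 0."""
    c = len(membership)
    return -sum(a * log(a, c) for a in membership if a > 0)


sns = [0.0186, 0.9814]            # memberships of "SNS" from Table I
masil = [0.0, 0.7143, 0.2857]     # memberships of "MASIL" from Table II
sns_scores = (bridgeness(sns), group_entropy(sns))
masil_scores = (bridgeness(masil), group_entropy(masil))
```

Both measures vanish for a vertex that belongs to a single group with probability 1 and reach 1 for a vertex shared equally among all groups, which is why only the two overlapping vertices have nonzero entries in the tables.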

The second network is the Gahuku-Gama Subtribes 
network, which was created based on Read’s study 
of the cultures of the Eastern Central Highlands of New 
Guinea [45]. This network describes the political 
alliances and enmities among the 16 Gahuku-Gama 
subtribes, which were distributed in a particular area and 
were engaged in warfare with one another in 1954. The 
positive and negative links of the network correspond to 
political arrangements with positive and negative ties, 
respectively. Fig. 5(b) tells us that this signed network 
consists of three groups, because the MDL of the SSBM 
model is minimal when c = 3. The three groups 
identified by our model are given in Fig. 7(a), and they 


1 This network and the Gahuku-Gama Subtribes network are 
both undirected graphs; therefore the parameter $\alpha$ is identical 
to $\beta$, and $\theta$ is identical to $\phi$. 

FIG. 7: (Color online) Exploratory analysis of the 
Gahuku-Gama Subtribes network [45]. The solid edges denote the 
positive links and the dashed edges represent negative links. 
The true community structure in this network is represented 
by three different shapes while the inferred groups are denoted 
by different colors. The sizes of the vertices are proportional 
to the parameters $\theta$. 


match perfectly with the true communities in the signed 
network. As shown in Table II, the vertex “MASIL” 
participates in the circle group with probability 0.7143 and 
in the square group with probability 0.2857. As a result, 
it has a large bridgeness of 0.3773 and group entropy 
of 0.5446. This implies that these two groups overlap 
with each other at this vertex, which is supported by the 
fact that the vertex “MASIL” has two positive links 
connected to “NAGAM” and “UHETO”, respectively. The 
learned parameters $\omega^+$ and $\omega^-$, shown in Fig. 7(b), 
again supply us with a thumbnail of the signed network. 

Finally, we test our model on the network of 
international relations taken from the Correlates of War data 
set over the period 1993–2001 [36]. In this network, 
positive links represent military alliances and negative 
links denote military disputes. The disputes are 
associated with three hostility levels, from “no militarized 
action” to “interstate war”. For each pair of countries, 
we chose the mean level of hostility between them over 
the given time interval as the weight of their negative 
link. The positive links denote the alliances: 1 for 
entente, 2 for non-aggression pact and 3 for defence pact. 
Finally, we normalized both the negative links and the 
positive links into the interval [0, 1], and the final weight of 
the link between each pair of countries is the normalized 
positive weight minus the normalized negative weight. 
The obtained network contains a giant component 
consisting of 161 vertices (countries) and 2517 links 
(conflicts or alliances). Here, we only investigate the 
structure of the giant component. 
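
The weighting scheme described above can be sketched as follows. The text states only that both kinds of weight are normalized into [0, 1]; dividing by the largest observed value is an assumption made here, and the function name, country labels and scores are placeholders rather than data from the Correlates of War set.

```python
def signed_weights(alliance, hostility):
    """Combine raw alliance and hostility scores into one signed weight per pair.

    Both arguments map a country pair (a sorted tuple) to a positive raw score.
    Dividing by the largest score is an assumed way of mapping into [0, 1].
    """
    max_pos = max(alliance.values(), default=1.0)
    max_neg = max(hostility.values(), default=1.0)
    weights = {}
    for pair in set(alliance) | set(hostility):
        positive = alliance.get(pair, 0.0) / max_pos
        negative = hostility.get(pair, 0.0) / max_neg
        weight = positive - negative
        if weight != 0.0:  # pairs whose two parts cancel carry no link
            weights[pair] = weight
    return weights


# Placeholder countries and scores, not data from the paper.
alliance = {("A", "B"): 3, ("A", "C"): 1}        # 3 = defence pact, 1 = entente
hostility = {("A", "C"): 1.5, ("B", "C"): 3.0}   # mean hostility levels
weights = signed_weights(alliance, hostility)
```

In this toy input the pair A and C has both an entente and a dispute history, and the dispute dominates after normalization, so the resulting link is negative.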

The structure of this network has been investigated 
in several existing studies. These studies indicated that 
there are six main power blocs, each consisting of a set 
of countries with similar patterns of alliances or disputes. 
In Ref. [36], the authors labeled these power blocs as (i) 
The West, (ii) Latin America, (iii) Muslim World, (iv) 
Asia, (v) West Africa, and (vi) Central Africa. 
Applying the SSBM model to this network, we find that the 
MDL reaches its minimum when c = 6, as illustrated in 
Fig. 5(c). By partitioning the network into six groups, 
we summarize the results in Fig. 8. From the rearranged 
adjacency matrix [Fig. 8(c)], we can conclude that the 
first, second, third and fifth groups, from bottom left to 
top right, distinctly belong to the community structure, 
while the sixth group can be viewed as the disassortative 
structure. However, the fourth group cannot be simply 
categorized as either community structure or 
disassortative structure. In agreement with the assumption of 
the SSBM model, vertices in each of the six groups exhibit 
similar connection profiles, although miscellaneous 
structural patterns coexist in this network. 
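
Reading group types off a rearranged adjacency matrix can be sketched as below. The helper names and the four-vertex toy network are assumptions introduced for illustration; the code only reorders the rows and columns by group label and counts positive and negative links per block, and does not fit the SSBM itself.

```python
def rearrange_by_group(adj, labels):
    """Reorder rows and columns so that vertices of the same group are adjacent."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    reordered = [[adj[i][j] for j in order] for i in order]
    return order, reordered


def block_counts(adj, labels):
    """Count (positive, negative) links from each group to each group."""
    counts = {}
    for i, row in enumerate(adj):
        for j, w in enumerate(row):
            if w == 0:
                continue
            key = (labels[i], labels[j])
            pos, neg = counts.get(key, (0, 0))
            counts[key] = (pos + 1, neg) if w > 0 else (pos, neg + 1)
    return counts


# Toy signed network with two groups that are positive inside, negative across.
adj = [[0, -1, 1, -1],
       [-1, 0, -1, 1],
       [1, -1, 0, -1],
       [-1, 1, -1, 0]]
labels = [1, 0, 1, 0]
order, reordered = rearrange_by_group(adj, labels)
counts = block_counts(adj, labels)
```

A group whose diagonal block is dominated by positive links and whose off-diagonal blocks are dominated by negative links looks like a community, as both groups do in this toy case; the reverse pattern corresponds to a disassortative group.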

From the perspective of the outgoing edges, we 
obtain a split of the network that is similar to the one 
obtained in Ref. [36], as shown in Fig. 8(a). However, 
several notable differences exist between the two results. 
Specifically, “Pakistan” is grouped with the West and 
“South Korea” is grouped with the Muslim World in 
Ref. [36]. Our model amends these misclassifications, 
in agreement with the configuration depicted in 
Huntington’s renowned book The Clash of Civilizations 
[46]. In addition, we categorized “Australia”, 
which is grouped with the West in Ref. [36], into the 
Asia group for understandable reasons. Fig. 8(b) gives a quite 
different structure of this network from the perspective of 
incoming edges. Three groups, namely the West, Latin 
America and Muslim World, stay almost the same. But 
“Russia”, together with some countries of the former 
Soviet Union, is separated from the Asia group and forms 
another independent power bloc. Meanwhile, the remaining 
countries in the Asia group join with the West Africa 
countries to constitute a bigger cluster. It is not difficult to 
see that all the changes appear to be in accordance with 
the history and evolution of international relations. 

Recall that the parameters $\theta$ and $\phi$ provide us with 
the centrality degrees of each vertex in its corresponding 
group from the perspective of outgoing edges and 
incoming edges, respectively. In other words, the parameters 
measure the importance of each vertex in its group. For 
a better visualization, the sizes of vertices in Fig. 8(d) 
and (e) are proportional to the magnitude of the scalars 
$\theta$ and $\phi$. Notably, we find that the big 
vertices, marked by the red bold border, usually stand for 
the dominant countries in their corresponding groups. 
For example, the largest vertex of the West is “USA” in 
Fig. 8(d). In fact, this state often serves as a leader in 
its power bloc. A similar interpretation can be given for 
the vertex “Russia” in the Asia group. We further check the 
bridgeness and group entropy for each vertex in the 
network (data not shown), and we mark the vertices, which 

FIG. 8: (Color online) Exploratory analysis of the international conflict and alliance network [36]. Maps of the groups found 
using the SSBM model from the perspective of outgoing edges (a) and incoming edges (b). (c) The rearranged adjacency matrix, 
in which the black dots denote positive links and the gray dots represent negative edges, respectively. Six groups are separated 
by black solid lines. The partition of this network obtained by the SSBM model from the perspective of outgoing edges (d) and 
incoming edges (e), where the solid edges denote the positive links and the dashed edges represent negative links. The sizes of 
vertices are respectively proportional to their centrality degree $\theta$ and $\phi$. The red bold border vertices have large centrality 
degrees while the black bold border vertices have large values of bridgeness and group entropy. 


have large values of these two measures, with the black 
bold border. As anticipated, these kinds of vertices are 
particularly prone to reside on the boundaries of 
different groups. That is to say, the vertices that are 
difficult to assign to a single group form a fuzzy boundary 
between the overlapping structures. In Fig. 8(b), three vertices, 
“Japan”, “Philippines” and “Australia”, with high 
values of bridgeness and group entropy, play a transitional 
role between the West and Asia groups. In reality, the 
above-mentioned Asian countries frequently collaborated 
with their counterparts in the West group in many areas, from 
economics to military affairs. 


V. CONCLUSIONS 

We propose an extension of the stochastic block model 
to study the mesoscopic structural patterns in signed 
networks. Without prior knowledge of which specific structures 
exist, our model can not only accurately detect broad 
types of intrinsic structures, but also directly learn 
their types from the network data. Experiments on a 
number of synthetic and real-world networks 
demonstrate that our model outperforms the state-of-the-art 
approaches at extracting various structural features in a 
given network. Owing to the flexibility inherited from the 
stochastic model, our method is an effective way to 
reveal the global organization of networks in terms of 
their structural regularities, which further helps us 
understand the relationship between a network’s structure and 
function. As future work, we will generalize our model 
by relaxing the requirement that the block matrices be 
square matrices and investigate the possible applications 
of the more flexible models. 